Switching to Learn

Shahin Shahrampour, Mohammad Amin Rahimian, Ali Jadbabaie* 


Abstract: A network of agents attempts to learn some unknown state of the world drawn by nature from a finite set. Agents observe private signals conditioned on the true state, and form beliefs about the unknown state accordingly. Each agent may face an identification problem in the sense that she cannot distinguish the truth in isolation. However, by communicating with each other, agents are able to benefit from side observations to learn the truth collectively. Unlike many distributed algorithms which rely on all-time communication protocols, we propose an efficient method that switches between Bayesian and non-Bayesian regimes. In this model, agents exchange information only when their private signals are not informative enough; thus, by switching between the two regimes, agents efficiently learn the truth using only a few rounds of communication. The proposed algorithm preserves learnability while incurring a lower communication cost. We also verify our theoretical findings by simulation examples.

I. Introduction 

Distributed estimation, detection, and learning theory in networks have attracted much attention over the past decades […], with applications that range from sensor and robotic networks […] to social and economic networks […]. In these scenarios, agents in a network need to learn the value of a parameter that they may not be able to infer on their own, but the global spread of information in the network provides them with adequate data to learn the truth collectively. As a result, agents iteratively exchange information with their neighbors. For instance, in distributed sensor and robotic networks, agents use local diffusion to augment their imperfect observations with information from their neighbors and achieve consensus and coordination […]. Similarly, agents exchange beliefs in social networks to benefit from each other's observations and private information and learn the unknown state of the world [15], [16].

Existing literature on distributed learning focuses mostly on environments where individuals communicate at every round. Of particular relevance to our discussion are a host of algorithms that follow the non-Bayesian learning scheme in Jadbabaie et al. […]. In their seminal work, the authors propose an observational social learning model using purely local diffusions. At any round, each agent performs a Bayesian update based on her privately observed signal and uses a linear convex combination to incorporate her Bayesian posterior with the beliefs of her neighbors and obtain a refined opinion. Inspired by […], many algorithms have been developed that either rely on all-time communication protocols [17], [18], [19], [20] or follow structured switching rules […]. For instance, in [21] Shahrampour et al. propose a scheme based on a gossip algorithm, and in [22] Nedić et al. present a method effective for switching topologies which respect a set of assumptions.

The chief aim of this note is to consider a scenario where communication at any given time t occurs only if an agent's belief does not change drastically due to her private observation at that time t, i.e., the agent's private signal is not informative enough. Accordingly, an agent uses Bayes' rule to update her belief with every strong private signal that she observes; otherwise, she uses a non-Bayesian averaging rule to refine her opinion by incorporating her neighbors' observations together with her own private signals.

Our contributions are as follows. We propose the total variation distance between the current belief of each agent and her Bayesian update after observing a given signal as the criterion for informativeness. In particular, a private signal is deemed informative if the distance between the agent's current belief and her Bayesian posterior given that signal exceeds a preset threshold. Given this criterion, we implement a switching mechanism in which agents shift between a Bayesian and a non-Bayesian regime. In the Bayesian regime, every agent uses Bayes' rule to update her belief based on her privately observed signal. In the non-Bayesian regime, the agents use an averaging rule to combine the observations communicated by their neighbors with their private signals. The analysis is challenging because the network topology becomes a function of the signals and does not evolve independently across time. Under some mild assumptions, we show that by switching between the two regimes based on the informativeness of signals, agents can efficiently learn the truth. We further provide an asymptotic rate of convergence, and discuss the performance of the algorithm in numerical experiments.

The remainder of this paper is organized as follows. The problem formulation and modeling details are set forth in Section II. The main results are presented in Section III, where we begin by describing the characterization of informativeness and the proposed switching rules in Subsections III-A and III-B, respectively, followed by the convergence analysis in Subsection III-C. Simulation examples and discussions in Section IV illustrate the results. Concluding remarks and future directions are provided in Section V. All proofs are included in the appendix.








II. Problem Formulation 

Notation: Throughout, $\mathbb{R}$ is the set of real numbers, $\mathbb{N}$ denotes the set of natural numbers, and $\mathbb{W} = \mathbb{N} \cup \{0\}$. For any fixed integer $n \in \mathbb{N}$ the set of integers $\{1, 2, \ldots, n\}$ is denoted by $[n]$, while any other set is represented by a calligraphic capital letter. The cardinality of a set $\mathcal{X}$, which is the number of its elements, is denoted by $|\mathcal{X}|$, and $\mathcal{P}(\mathcal{X})$ is the power set of $\mathcal{X}$, which is the set of all its subsets. Boldface letters denote random variables, and vectors are in column form. $I_n$ denotes the $n \times n$ identity matrix, $\mathbb{1}$ represents the vector of all ones, and $^{T}$ denotes the matrix transpose.

A. The Model 


Consider a set of $n$ agents that are labeled by $[n]$ and interact according to a weighted and directed graph $\mathcal{G} = ([n], \mathcal{E}, P)$, where $\mathcal{E} \subseteq [n] \times [n]$ is the set of edges and $P \in \mathbb{R}^{n \times n}$ is a symmetric doubly stochastic matrix. The $ij$-th entry of $P$, denoted by $p_{ij} = [P]_{ij}$, assigns a positive weight to edge $(i,j)$ if $(i,j) \in \mathcal{E}$, and sets $p_{ij} = 0$ if $(i,j) \notin \mathcal{E}$. We further have $[P]_{ii} > 0$ for every $i \in [n]$, i.e., all agents have positive self-reliance. $\mathcal{N}(i) = \{j \in [n] : (j,i) \in \mathcal{E}, j \neq i\}$ is called the neighborhood of agent $i$.

The goal of each agent is to decide between one of the $m$ possible states from the state space $\Theta$. $\Delta\Theta$ is the space of all probability measures on the set $\Theta$. A random variable $\boldsymbol{\theta}$ is chosen randomly from $\Theta$ by nature and according to the probability measure $\nu(\cdot) \in \Delta\Theta$, which satisfies $\nu(\hat{\theta}) > 0$ for all $\hat{\theta} \in \Theta$, and is referred to as the common prior. Associated with each agent $i$, $\mathcal{S}_i$ is a finite set called the signal space of $i$, and given $\hat{\theta}$, $\ell_i(\cdot|\hat{\theta})$ is a probability measure on $\mathcal{S}_i$, which is referred to as the signal structure or likelihood function of agent $i$. Furthermore, $(\Omega, \mathscr{F}, \mathbb{P})$ is a probability triple, where $\Omega = \left(\prod_{i \in [n]} \mathcal{S}_i\right)^{\mathbb{W}} \times \Theta$ is an infinite product space with a general element $\omega = ((s_{1,0}, \ldots, s_{n,0}), (s_{1,1}, \ldots, s_{n,1}), \ldots, \theta)$ and the associated sigma field $\mathscr{F} = \mathcal{P}(\Omega)$. $\mathbb{P}(\cdot)$ is the probability measure on $\Omega$ which assigns probabilities consistently with the common prior $\nu(\cdot)$ and the likelihood functions $\ell_i(\cdot|\hat{\theta})$, $i \in [n]$, such that conditioned on $\boldsymbol{\theta}$ the random variables $\{\mathbf{s}_{i,t}, i \in [n], t \in \mathbb{W}\}$ are independent. Note that the observed signals are independent and identically distributed over time, and independent across the agents at each epoch of time. $\mathbb{E}[\cdot]$ is the expectation operator, which represents integration with respect to $d\mathbb{P}(\omega)$. Let $\theta$ be the unknown state drawn initially by nature. Since signals are generated based on $\theta$, we have that

$$ \mathbb{E}\left[\log \frac{\ell_i(\cdot|\hat{\theta})}{\ell_i(\cdot|\theta)}\right] = -D_{KL}\left(\ell_i(\cdot|\theta) \,\|\, \ell_i(\cdot|\hat{\theta})\right) \le 0, $$

where the inequality follows from the fact that $D_{KL}(\cdot\|\cdot)$, the Kullback-Leibler divergence, is always nonnegative [23]. The inequality is strict if and only if $\ell_i(\cdot|\hat{\theta}) \not\equiv \ell_i(\cdot|\theta)$, i.e., $\exists s \in \mathcal{S}_i$ such that $\ell_i(s|\hat{\theta}) \neq \ell_i(s|\theta)$. Note that whenever $\ell_i(\cdot|\hat{\theta}) \equiv \ell_i(\cdot|\theta)$, or equivalently $D_{KL}\left(\ell_i(\cdot|\theta) \,\|\, \ell_i(\cdot|\hat{\theta})\right) = 0$, the two states $\hat{\theta}$ and $\theta$ are statistically indistinguishable to agent $i$. In other words, there is no way for agent $i$ to differentiate $\hat{\theta}$ from $\theta$ based only on her private signals. This follows from the fact that both $\theta$ and $\hat{\theta}$ induce the same probability distribution on her sequence of observed i.i.d. signals. We, therefore, have the following characterization.

Definition 1 (Observationally Equivalent States). For any $\theta \in \Theta$ the set of states $\hat{\theta} \in \Theta$ that are observationally equivalent to $\theta$ for agent $i$ is given by $\mathcal{O}_i(\theta) = \{\hat{\theta} \in \Theta : \ell_i(s|\hat{\theta}) = \ell_i(s|\theta), \ \forall s \in \mathcal{S}_i\}$.

To distinguish the true state of the world $\theta$ from any false state $\hat{\theta} \neq \theta$, there must exist an agent that is able to detect $\hat{\theta}$ as a false state, in which case it holds that

$$ I(\hat{\theta}, \theta) := -\frac{1}{n} \sum_{i=1}^{n} D_{KL}\left(\ell_i(\cdot|\theta) \,\|\, \ell_i(\cdot|\hat{\theta})\right) < 0. \tag{1} $$

We, therefore, have the following characterization.

Definition 2 (Global Identifiability). The true state $\theta$ is globally identifiable if $I(\hat{\theta}, \theta) < 0$ for all $\hat{\theta} \in \Theta \setminus \{\theta\}$.
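
As a minimal sketch of Definitions 1 and 2, the following Python snippet computes the Kullback-Leibler divergences and the quantity $I(\hat{\theta}, \theta)$ in (1) for a small hypothetical network. The two-agent, three-state signal structure and all function names are assumptions made for illustration; they are not taken from the paper.

```python
import math

def kl_divergence(p, q):
    """D_KL(p || q) for two pmfs given as equal-length lists (q strictly positive)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def identifiability_index(likelihoods, true_state, other_state):
    """I(other, true) = -(1/n) * sum_i D_KL(l_i(.|true) || l_i(.|other)), as in (1)."""
    n = len(likelihoods)
    total = sum(kl_divergence(l[true_state], l[other_state]) for l in likelihoods)
    return -total / n

# Hypothetical example: 2 agents, 3 states, binary signals.
# likelihoods[i][state] is the pmf of agent i's signal under that state.
likelihoods = [
    {0: [0.5, 0.5], 1: [0.8, 0.2], 2: [0.5, 0.5]},  # agent 0 cannot tell state 0 from 2
    {0: [0.5, 0.5], 1: [0.5, 0.5], 2: [0.3, 0.7]},  # agent 1 cannot tell state 0 from 1
]

# With true state 0, each false state is detected by exactly one agent.
index = {s: identifiability_index(likelihoods, 0, s) for s in (1, 2)}
globally_identifiable = all(value < 0 for value in index.values())
```

In this toy structure neither agent can identify state 0 alone, yet every false state has a strictly negative index, so the true state is globally identifiable in the sense of Definition 2.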

We adhere to the following assumptions throughout the 
paper. 

A1. All log-marginals are uniformly bounded such that $|\log \ell_i(s_i|\hat{\theta})| \le B$ for all $i \in [n]$, $s_i \in \mathcal{S}_i$, and any $\hat{\theta} \in \Theta$.

A2. The true state is globally identifiable, i.e., we have $I(\hat{\theta}, \theta) < 0$ for any $\hat{\theta} \in \Theta \setminus \{\theta\}$.

A3. The graph $\mathcal{G}$ is strongly connected, i.e., there exists a directed path from any node $i \in [n]$ to any node $j \in [n]$.

Assumption A1 implies that every signal has bounded information content. For instance, it holds when the signal space is finite and all likelihoods are strictly positive. Assumption A2 guarantees that accumulation of likelihoods provides sufficient information to make the true state uniquely identifiable from the aggregate observations of all agents across the network. Finally, strong connectivity (assumption A3) guarantees the flow of information in the network. We end this section with the following definition [24].

Definition 3 (Connectivity). Consider a sequence of directed graphs $\mathcal{G}_t = ([n], \mathcal{E}_t, A_t)$ for $t \in \mathbb{N}$, where $A_t$ is a stochastic matrix. A node $i \in [n]$ is connected to a node $j \neq i$ across an interval $\mathcal{T} \subseteq \mathbb{N}$ if there exists a directed path from $i$ to $j$ for the directed graph $([n], \cup_{t \in \mathcal{T}} \mathcal{E}_t)$.

B. Belief Updates 

For each time instant $t$, let $\mu_{i,t}(\cdot)$ be the probability mass function on $\Theta$ representing the opinion or belief at time $t$ of agent $i$ about the unknown state of the world. The goal is to investigate the problem of asymptotic learning, that is, each agent learning the true realized value $\theta$. Convergence can be in probability or in the almost sure sense. In this paper, we are interested in the asymptotic and almost sure characterization of learning, formalized as follows.

Definition 4 (Learning). An agent $i \in [n]$ learns the true state $\theta$ asymptotically if $\mu_{i,t}(\theta) \to 1$, $\mathbb{P}$-almost surely.





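At $t = 0$, the value $\boldsymbol{\theta} = \theta$ is realized, and each agent $i \in [n]$ forms an initial Bayesian opinion $\mu_{i,0}(\cdot)$ about the value of $\theta$. Given the signal $s_{i,0}$, and using the Bayesian update for each agent $i \in [n]$, her initial belief in terms of the observed signal $s_{i,0}$ is given by

$$ \mu_{i,0}(\hat{\theta}) = \frac{\nu(\hat{\theta})\,\ell_i(s_{i,0}|\hat{\theta})}{\sum_{\tilde{\theta} \in \Theta} \nu(\tilde{\theta})\,\ell_i(s_{i,0}|\tilde{\theta})}, \quad \forall \hat{\theta} \in \Theta. $$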

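At any $t \in \mathbb{N}$, agent $i$ uses the following update rule to calculate $\phi_{i,t}(\hat{\theta})$:

$$ \phi_{i,t}(\hat{\theta}) = \sum_{j=1}^{n} [Q_t]_{ij}\, \phi_{j,t-1}(\hat{\theta}) + \log \ell_i(s_{i,t}|\hat{\theta}), \tag{2} $$

for any $\hat{\theta} \in \Theta$, where $Q_t$ is a real $n \times n$ matrix (possibly random and time varying) and $\phi_{i,0}(\hat{\theta}) = 0$ by convention. Then she updates her belief $\mu_{i,t}(\cdot)$ as

$$ \mu_{i,t}(\hat{\theta}) = \frac{\mu_{i,0}(\hat{\theta})\, e^{\phi_{i,t}(\hat{\theta})}}{\sum_{\tilde{\theta} \in \Theta} \mu_{i,0}(\tilde{\theta})\, e^{\phi_{i,t}(\tilde{\theta})}}, \tag{3} $$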


for any $\hat{\theta} \in \Theta$. In Section III-B we shall describe in detail the switching strategy under which $Q_t$ evolves. If there is no communication among agents, we have $Q_t = I_n$. Hence, each agent $i$ observes the realized value of $s_{i,t}$, calculates the likelihood $\ell_i(s_{i,t}|\hat{\theta})$ for any $\hat{\theta} \in \Theta$, and forms an opinion using Bayes' rule

$$ \mu^{B}_{i,t}(\hat{\theta}) = \frac{\mu_{i,t-1}(\hat{\theta})\,\ell_i(s_{i,t}|\hat{\theta})}{\sum_{\tilde{\theta} \in \Theta} \mu_{i,t-1}(\tilde{\theta})\,\ell_i(s_{i,t}|\tilde{\theta})}, \tag{4} $$

where $\mu_{i,t-1}(\cdot)$ is calculated using (3). Alternatively, at any time $t$ at which the Bayes' update based on the private signal $s_{i,t}$ does not provide enough information (on which we elaborate in Section III-B), agent $i$ switches to a non-Bayesian update, incorporating her neighbors' beliefs but only for that particular unit of time $t$. Collecting log-likelihoods from her neighborhood, agent $i \in [n]$ averages the local data by performing (2) with $[Q_t]_{ij} = [P]_{ij}$ and uses the resultant $\phi_{i,t}(\hat{\theta})$ in (3) to obtain a refined but non-Bayesian opinion $\mu_{i,t}(\cdot)$. One can view the learning rules (2) and (3) for each agent $i$ as repeated Bayesian updates in an infinite sequence of contiguous, nonempty and bounded time intervals. At the outset of each interval, the agent's prior is derived by averaging the local information from her neighbors, while during the interval there is no communication and the agent performs successive Bayesian updates based on her private signals. Writing the matrix form of (2), it can be verified (see Lemma 3 in […]) that


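$$ \phi_{i,t}(\hat{\theta}) = \sum_{\tau=0}^{t} \sum_{j=1}^{n} \left[\prod_{\rho=0}^{t-1-\tau} Q_{t-\rho}\right]_{ij} \log \ell_j(s_{j,\tau}|\hat{\theta}). \tag{5} $$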


Finally, note that choosing $Q_t$ at each time $t$ based on a gossip protocol reduces the setting to [21], while $Q_t = P$ recovers the model considered in [18]. In both cases convergence of beliefs occurs at the cost of communicating at every round.
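
The updates (2) and (3) can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the list-of-lists representation, and the two-agent example are assumptions, and the maximum is subtracted before exponentiating purely for numerical stability (it cancels in the normalization of (3)).

```python
import math

def update_phi(phi_prev, Q, loglik):
    """Eq. (2): phi_t[i][k] = sum_j Q[i][j] * phi_prev[j][k] + loglik[i][k]."""
    n, m = len(phi_prev), len(phi_prev[0])
    return [[sum(Q[i][j] * phi_prev[j][k] for j in range(n)) + loglik[i][k]
             for k in range(m)] for i in range(n)]

def belief_from_phi(mu0_i, phi_i):
    """Eq. (3): mu_t(k) is proportional to mu_0(k) * exp(phi_t(k))."""
    shift = max(phi_i)  # subtracting the max cancels out after normalization
    weights = [p * math.exp(f - shift) for p, f in zip(mu0_i, phi_i)]
    total = sum(weights)
    return [w / total for w in weights]

# Two agents, two states, no communication (Q_t = I), which reduces to Bayes' rule (4).
mu0 = [[0.5, 0.5], [0.5, 0.5]]
phi = [[0.0, 0.0], [0.0, 0.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
# Log-likelihood of each agent's observed signal under each state (hypothetical values).
loglik = [[math.log(0.8), math.log(0.2)], [math.log(0.5), math.log(0.5)]]

phi = update_phi(phi, identity, loglik)
beliefs = [belief_from_phi(mu0[i], phi[i]) for i in range(2)]
```

With $Q_t = I_n$ the first agent's belief moves to the Bayesian posterior, while the second agent, whose signal is equally likely under both states, keeps her prior.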

III. Main Results 

In this section, we propose the switching rule based on which the (possibly random and time varying) matrix $Q_t$ in (2) is chosen. The rule characterizes the dichotomy between the non-communicative Bayesian and the communicative non-Bayesian regime. We shall prove that all agents learn the truth efficiently under this protocol. The switching, as we describe next, is driven by the quality of information that private signals offer.

A. Characterizing the Class of Informative Signals 

An informative signal is one that substantially influences an agent's opinion. Here we propose the total variation distance between $\mu^{B}_{i,t}(\cdot)$ and $\mu_{i,t-1}(\cdot)$ as the measure of informativeness for a private signal $s_{i,t}$. In particular, the private signal $s_{i,t}$ is informative for agent $i$ at time $t$ if $\|\mu^{B}_{i,t}(\cdot) - \mu_{i,t-1}(\cdot)\|_{TV} > \tau$, where $0 < \tau < 1$ is a given threshold.
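
The informativeness test above amounts to one Bayes step followed by a distance computation. The sketch below is a hedged illustration; the helper names and the numeric beliefs and likelihoods are hypothetical.

```python
def bayes_posterior(belief, likelihood_of_signal):
    """Bayes' rule (4): posterior proportional to prior times likelihood."""
    weights = [b * l for b, l in zip(belief, likelihood_of_signal)]
    total = sum(weights)
    return [w / total for w in weights]

def total_variation(p, q):
    """Total variation distance, i.e. half the L1 distance between two pmfs."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def is_informative(belief, likelihood_of_signal, tau):
    """True if the Bayesian posterior moves by more than tau in total variation."""
    posterior = bayes_posterior(belief, likelihood_of_signal)
    return total_variation(posterior, belief) > tau

belief = [0.6, 0.4]   # current belief over two states
strong = [0.9, 0.1]   # likelihood of the observed signal under each state
weak = [0.5, 0.5]     # a signal that is equally likely under both states

strong_is_informative = is_informative(belief, strong, tau=0.1)
weak_is_informative = is_informative(belief, weak, tau=0.1)
```

A signal with identical likelihood under every state leaves the belief unchanged, so it is never informative for any $\tau > 0$.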

Example 1. Informative Signals in a Binary World

Consider the case where $\Theta = \{1, 2\}$, and the true state is $\theta = 1$. Define $\epsilon_{i,t} = \mu_{i,t}(2)$ as the mass assigned to the false state by agent $i$ at time $t$. For the binary state space considered here, the evolution of each agent's beliefs is uniquely characterized by that of $\epsilon_{i,t}$, and the focus of interest is therefore to have $\epsilon_{i,t}$ converge to zero almost surely.

Let $r(s_{i,t}) := \ell_i(s_{i,t}|1)/\ell_i(s_{i,t}|2)$ be the likelihood ratio under signal $s_{i,t}$. To investigate the conditions for informativeness of the private signals, we start by simplifying the expression for $\|\mu^{B}_{i,t}(\cdot) - \mu_{i,t-1}(\cdot)\|_{TV}$ as follows:

$$ \|\mu^{B}_{i,t}(\cdot) - \mu_{i,t-1}(\cdot)\|_{TV} = \frac{1}{2}\|\mu^{B}_{i,t}(\cdot) - \mu_{i,t-1}(\cdot)\|_{1} = \frac{\epsilon_{i,t-1}(1 - \epsilon_{i,t-1})\,|\ell_i(s_{i,t}|1) - \ell_i(s_{i,t}|2)|}{(1 - \epsilon_{i,t-1})\,\ell_i(s_{i,t}|1) + \epsilon_{i,t-1}\,\ell_i(s_{i,t}|2)} = \frac{\epsilon_{i,t-1}(1 - \epsilon_{i,t-1})\,|r(s_{i,t}) - 1|}{(1 - \epsilon_{i,t-1})\,r(s_{i,t}) + \epsilon_{i,t-1}}. $$

To investigate the informativeness condition $\|\mu^{B}_{i,t}(\cdot) - \mu_{i,t-1}(\cdot)\|_{TV} > \tau$, we distinguish two cases, $r(s_{i,t}) \ge 1$ and $r(s_{i,t}) < 1$. For $r(s_{i,t}) \ge 1$, we get

$$ \|\mu^{B}_{i,t}(\cdot) - \mu_{i,t-1}(\cdot)\|_{TV} > \tau \iff \frac{\epsilon_{i,t-1}(1 - \epsilon_{i,t-1})\,(r(s_{i,t}) - 1)}{(1 - \epsilon_{i,t-1})\,r(s_{i,t}) + \epsilon_{i,t-1}} > \tau \iff r(s_{i,t}) > \frac{\tau\epsilon_{i,t-1} + \epsilon_{i,t-1}(1 - \epsilon_{i,t-1})}{\epsilon_{i,t-1}(1 - \epsilon_{i,t-1}) - \tau(1 - \epsilon_{i,t-1})}, $$

provided that $\epsilon_{i,t-1} > \tau$; otherwise, when $\epsilon_{i,t-1} \le \tau$, no signal with a likelihood ratio $r(s_{i,t}) \ge 1$ will be regarded as informative. In other words, for an agent whose belief is already sufficiently close to the truth such likely signals are not surprising.

On the other hand, for $r(s_{i,t}) < 1$ we have

$$ \|\mu^{B}_{i,t}(\cdot) - \mu_{i,t-1}(\cdot)\|_{TV} > \tau \iff \frac{\epsilon_{i,t-1}(1 - \epsilon_{i,t-1})\,(1 - r(s_{i,t}))}{(1 - \epsilon_{i,t-1})\,r(s_{i,t}) + \epsilon_{i,t-1}} > \tau \iff r(s_{i,t}) < \frac{\epsilon_{i,t-1}(1 - \epsilon_{i,t-1}) - \tau\epsilon_{i,t-1}}{\epsilon_{i,t-1}(1 - \epsilon_{i,t-1}) + \tau(1 - \epsilon_{i,t-1})}, $$

when $\epsilon_{i,t-1} < 1 - \tau$; however, an agent whose belief satisfies $\epsilon_{i,t-1} \ge 1 - \tau$ has become almost certain of a falsity, whence she finds no signal with $r(s_{i,t}) < 1$ surprising or informative.

The preceding conditions characterize the criterion under which agent $i$ regards an observation $s_{i,t}$ as informative, for a binary state space $\Theta = \{1, 2\}$. ■
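
The closed-form conditions of Example 1 can be cross-checked numerically against the total variation distance itself. The following sketch does this on a grid; the grid, the threshold value 0.13, and the function names are arbitrary choices for illustration, and grid points within $10^{-9}$ of the decision boundary are skipped to avoid floating-point ties.

```python
def tv_binary(eps, r):
    """TV distance between the Bayes posterior and the prior in the binary case."""
    return eps * (1 - eps) * abs(r - 1) / ((1 - eps) * r + eps)

def informative_closed_form(eps, r, tau):
    """Threshold conditions on the likelihood ratio r derived in Example 1."""
    if r >= 1:
        if eps <= tau:
            return False  # belief already close to the truth
        bound = (tau * eps + eps * (1 - eps)) / (eps * (1 - eps) - tau * (1 - eps))
        return r > bound
    if eps >= 1 - tau:
        return False      # belief almost certain of the false state
    bound = (eps * (1 - eps) - tau * eps) / (eps * (1 - eps) + tau * (1 - eps))
    return r < bound

tau = 0.13
grid_eps = [k / 20 for k in range(1, 20)]     # eps in {0.05, ..., 0.95}
grid_r = [0.05 * k for k in range(1, 100)]    # r in {0.05, ..., 4.95}

# Collect grid points where the direct test and the closed form disagree.
mismatches = [(e, r) for e in grid_eps for r in grid_r
              if abs(tv_binary(e, r) - tau) > 1e-9
              and (tv_binary(e, r) > tau) != informative_closed_form(e, r, tau)]
```

An empty `mismatches` list indicates that the two characterizations agree on the whole grid.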

B. The Switching Rule

Based on the characterization of the informative signals in the previous section, we now introduce a switching strategy. At each epoch $t$, any agent $i \in [n]$ that receives an uninformative private signal $s_{i,t}$ exchanges her log-marginal with all her neighbors $j \in \mathcal{N}(i)$. On the other hand, if $s_{i,t}$ is informative for agent $i$, but a set $\mathcal{A}_{i,t} \subseteq \mathcal{N}(i)$ of neighboring agents request an information exchange (i.e., signal $s_{j,t}$ is not informative for neighbor $j$, for all $j \in \mathcal{A}_{i,t}$), then agent $i$ exchanges her log-marginal only with those particular neighbors $j \in \mathcal{A}_{i,t}$ who are requesting it (and have received uninformative signals). Therefore, the communication is bidirectional, and we have $[Q_t]_{ij} = [Q_t]_{ji} = [P]_{ji}$ whenever either or both of the agents $i$ and $j$ have received uninformative signals. Moreover, $[Q_t]_{ii} = 1 - \sum_{j \in \mathcal{N}(i)} [Q_t]_{ij}$ for all $i$. In particular, whenever all private signals are informative, agents stick to their Bayesian updates (4), and $Q_t = I_n$. Accordingly, at each time $t$ the weighting matrix $Q_t$ which appeared in (2) is a symmetric and doubly stochastic matrix, and we have the following switching rule for any $t \in \mathbb{W}$:

Switching Rule: Given $\tau > 0$, for any $i \in [n]$ that satisfies $\|\mu^{B}_{i,t}(\cdot) - \mu_{i,t-1}(\cdot)\|_{TV} \le \tau$, the $i$-th column and row of $Q_t$ take the values of the $i$-th column and row of the symmetric matrix $P$. Then, the diagonal elements of $Q_t$ are filled such that the matrix is doubly stochastic.
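
One possible way to assemble $Q_t$ from $P$ under the switching rule is sketched below. The function name, the set-based encoding of the agents with uninformative signals, and the three-node path graph are assumptions for illustration only.

```python
def switching_matrix(P, uninformative):
    """Build Q_t: edge (i, j) carries weight P[i][j] if i or j had an uninformative signal."""
    n = len(P)
    Q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and (i in uninformative or j in uninformative):
                Q[i][j] = P[i][j]
        # Fill the diagonal so that every row sums to one.
        Q[i][i] = 1.0 - sum(Q[i][j] for j in range(n) if j != i)
    return Q

# Hypothetical symmetric doubly stochastic matrix on a 3-node path graph 0 - 1 - 2.
P = [[0.50, 0.50, 0.00],
     [0.50, 0.25, 0.25],
     [0.00, 0.25, 0.75]]

Q_only_agent_2 = switching_matrix(P, {2})   # only the edge between agents 1 and 2 is active
Q_nobody = switching_matrix(P, set())       # all signals informative: identity matrix
Q_everybody = switching_matrix(P, {0, 1, 2})  # all signals uninformative: recovers P
```

Because $P$ is symmetric and an edge is activated for both endpoints at once, the resulting matrix stays symmetric, and filling the diagonal row by row therefore also makes every column sum to one.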

Before shifting focus to the convergence analysis under the proposed rule, we note that with $\tau = 1$ all signals will be considered uninformative by all agents at every epoch of time; hence, at every time step agents choose to communicate, $Q_t = P$ for all $t$, and they learn the truth exponentially fast [18]. However, the learning occurs under an all-time communication protocol, which is inefficient when communication is costly. We shall demonstrate that the same learning quality can be achieved through the proposed switching rule, while incurring only a few rounds of communication.

C. Consensus on the True State 

We now state the technical results of the paper, and provide the proofs in the Appendix. The following lemma concerns the behavior of agents in the Bayesian regime. In particular, it guarantees that with probability one, if the switching condition is satisfied at some time $t_1$, there exists a $t_2 > t_1$ at which the switching condition is satisfied again. Furthermore, the length of the interval $t_2 - t_1$ is finite almost surely.

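Lemma 1 (Bayesian Learning). Let the log-marginals be bounded (assumption A1). Assume that agent $i \in [n]$ is allowed to follow the Bayesian update (4) after some time $\hat{t}$, i.e., $\mu_{i,t}(\hat{\theta}) = \mu^{B}_{i,t}(\hat{\theta})$ for any $\hat{\theta} \in \Theta$ and $t > \hat{t}$. We then have

$$ \mu_{i,t}(\hat{\theta}) \to 0, \quad \forall \hat{\theta} \in \Theta \setminus \mathcal{O}_i(\theta), $$

almost surely.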

Lemma 1 simply implies that the switching condition $\|\mu^{B}_{i,t}(\cdot) - \mu_{i,t-1}(\cdot)\|_{TV} \le \tau$ is satisfied for all agents after a finite (but random) number of iterations. We also state the following proposition (using our notation) from [24] to invoke later in the analysis.

Proposition 1. Consider a sequence of directed graphs $\mathcal{G}_t = ([n], \mathcal{E}_t, A_t)$ for $t \in \mathbb{N}$, where $A_t$ is a stochastic matrix. Assume the existence of real numbers $\delta_{\max} \ge \delta_{\min} > 0$ such that $\delta_{\min} \le [A_t]_{ij} \le \delta_{\max}$ for any $(i,j) \in \mathcal{E}_t$. Assume in addition that the graph $\mathcal{G}_t$ is bidirectional for any $t \in \mathbb{N}$. If for all $t_0 \in \mathbb{N}$ there is a node connected to all other nodes across $[t_0, \infty)$, then the left product $A_t A_{t-1} \cdots A_1$ converges to a limit.
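
To give a feel for the kind of limit Proposition 1 is used for, the sketch below forms the left product of a sequence that alternates between the identity (no communication) and a hypothetical symmetric doubly stochastic matrix $P$ on a connected three-node path. The alternating schedule and the matrix are assumptions chosen for illustration; the example only shows that such a product can approach the averaging matrix $\frac{1}{n}\mathbb{1}\mathbb{1}^{T}$.

```python
def matmul(A, B):
    """Plain matrix product of two square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# Hypothetical symmetric doubly stochastic matrix on the path graph 0 - 1 - 2.
P = [[0.50, 0.50, 0.00],
     [0.50, 0.25, 0.25],
     [0.00, 0.25, 0.75]]
I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# Left product Q_t Q_{t-1} ... Q_1 with communication on even rounds only.
product = I3
for t in range(1, 401):
    Q_t = P if t % 2 == 0 else I3
    product = matmul(Q_t, product)

# Largest entrywise deviation from the averaging matrix (1/n) * ones.
max_dev = max(abs(product[i][j] - 1.0 / 3.0) for i in range(3) for j in range(3))
```

Here communication happens in only half of the rounds, yet the product still settles on uniform averaging.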

We use the previous technical results to prove that under 
the proposed switching algorithm, all agents learn the truth, 
asymptotically and almost surely. 

Theorem 1 (Learning in Switching Regimes). Let the bound on log-marginals (assumption A1), global identifiability of the true state (assumption A2), and strong connectivity of the network (assumption A3) hold. Then, following the updates in (2) and (3) using the switching rule in Section III-B, all agents learn the truth exponentially fast with an asymptotic rate given by $\min_{\hat{\theta} \in \Theta \setminus \{\theta\}} \{-I(\hat{\theta}, \theta)\} > 0$.

Theorem 1 captures the trade-off between communication and informativeness of private signals. More specifically, private signals do not provide each agent with adequate information to learn the true state. Hence, agents require other signals dispersed throughout the network, which highlights the importance of communication. On the other hand, all-time communication is unnecessary since agents might only need a handful of interactions to augment their imperfect observations with those of their neighbors.
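
The interplay described above can be explored with a small simulation of updates (2) and (3) under the switching rule. The sketch below is not the authors' experimental setup: the three-agent path network, the four states, the signal likelihoods, the uniform initial beliefs (the time-0 signal is skipped), the threshold, and the horizon are all assumptions made for illustration. How many rounds involve communication depends strongly on the threshold and the horizon.

```python
import math
import random

def belief(mu0_i, phi_i):
    """Eq. (3), with the maximum subtracted for numerical stability."""
    shift = max(phi_i)
    weights = [p * math.exp(f - shift) for p, f in zip(mu0_i, phi_i)]
    total = sum(weights)
    return [w / total for w in weights]

def simulate(T=600, tau=1e-6, seed=0):
    rng = random.Random(seed)
    P = [[0.50, 0.50, 0.00], [0.50, 0.25, 0.25], [0.00, 0.25, 0.75]]
    n, m = 3, 4  # three agents, four states; state 0 is the true state
    # lik[i][k][s]: probability of signal s for agent i under state k.
    # Agent i can only tell state i + 1 apart from the true state.
    lik = [[[0.2, 0.8] if k == i + 1 else [0.5, 0.5] for k in range(m)]
           for i in range(n)]
    mu0 = [[1.0 / m] * m for _ in range(n)]  # assumed uniform initial beliefs
    phi = [[0.0] * m for _ in range(n)]
    mu = [row[:] for row in mu0]
    comm_rounds = 0
    for _ in range(T):
        signals = [0 if rng.random() < lik[i][0][0] else 1 for i in range(n)]
        loglik = [[math.log(lik[i][k][signals[i]]) for k in range(m)]
                  for i in range(n)]
        # Agents whose Bayes step moves the belief by at most tau in total variation.
        quiet = set()
        for i in range(n):
            w = [mu[i][k] * lik[i][k][signals[i]] for k in range(m)]
            z = sum(w)
            tv = 0.5 * sum(abs(w[k] / z - mu[i][k]) for k in range(m))
            if tv <= tau:
                quiet.add(i)
        comm_rounds += bool(quiet)
        # Switching rule: activate edges touching a quiet agent, then fill the diagonal.
        Q = [[P[i][j] if i != j and (i in quiet or j in quiet) else 0.0
              for j in range(n)] for i in range(n)]
        for i in range(n):
            Q[i][i] = 1.0 - sum(Q[i])
        phi = [[sum(Q[i][j] * phi[j][k] for j in range(n)) + loglik[i][k]
                for k in range(m)] for i in range(n)]
        mu = [belief(mu0[i], phi[i]) for i in range(n)]
    return mu, comm_rounds

final_beliefs, comm_rounds = simulate()
```

In this toy run no agent can rule out all false states alone, so any mass that ends up on state 0 for every agent has to come from the exchanged log-marginals.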

IV. Numerical Experiments 

In this section, we exemplify the efficiency of the method using synthetic data. We generate a network of $n = 15$ agents that aim to recover the true state $\theta = \theta_1$ among $m = 16$ possible states of the world. The signals are binary digits, i.e., $s_{i,t} \in \{0, 1\}$ at each time $t$. For each agent $i \in [n]$, only state $\theta_{i+1}$ is not observationally equivalent to the true state $\theta_1$. This implies that $\Theta \setminus \mathcal{O}_i(\theta_1) = \{\theta_{i+1}\}$, which results in $\cap_{i=1}^{n} \mathcal{O}_i(\theta_1) = \{\theta_1\}$ to guarantee global identifiability of the true state. We set the threshold such that $\log_{10} \tau = -17$ for our switching rule and perform the updates (2) and (3) for 1000 iterations. In Fig. 1 we see that all agents reach consensus on the true state.

We now turn to compare the efficiency of the algorithm versus its counterpart in […]. Fig. 2 represents the belief evolution under both algorithms for a randomly selected agent in the network. We observe that both algorithms converge; however, our proposed algorithm outperforms the one in […] in terms of efficiency. The selected agent engages in interactions only 41 times in 1000 rounds. Therefore, her communication load is only 4.1% of that under the scheme producing the green curve, which is a significant reduction.

Fig. 1. The evolution of the belief on the true state for all agents in the network (horizontal axis: iteration). Agents avoid all-time information exchange using the proposed switching rule, and eventually learn the truth.

Fig. 2. The comparison of belief evolution for a randomly selected agent in the network (horizontal axis: iteration). The blue curve is generated under the algorithm presented in this work, while the green one is based on the scheme in […].

V. Concluding Remarks

In this paper we analyzed the problem of learning for a group of agents who try to infer an unknown state of the world. Agents rely on their private signals to perform a Bayesian update. However, private observations of a single agent may not provide sufficient information to identify the truth. Whenever agents' private signals lack adequate information, they engage in bidirectional communications with each other to benefit from side observations. We showed that under the proposed algorithm agents learn the true state asymptotically almost surely while dramatically saving on their communication budgets. Our future work focuses on advancing the proposed formulation by deriving optimality results in terms of the communication cost and convergence speed. This would in turn allow us to design an optimal informativeness threshold in the proposed switching strategies.


Appendix: Proofs

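Proof of Lemma 1: Given the hypothesis, agent $i$ follows the Bayesian update after $\hat{t}$, and we have

$$ \mu_{i,t}(\hat{\theta}) = \mu^{B}_{i,t}(\hat{\theta}) = \frac{\mu_{i,t-1}(\hat{\theta})\,\ell_i(s_{i,t}|\hat{\theta})}{\sum_{\tilde{\theta} \in \Theta} \mu_{i,t-1}(\tilde{\theta})\,\ell_i(s_{i,t}|\tilde{\theta})}, $$

for any $\hat{\theta} \in \Theta$ and $t > \hat{t}$. Recalling that $\theta$ denotes the true state, we can write for any $t > \hat{t}$,

$$ \log \frac{\mu_{i,t}(\hat{\theta})}{\mu_{i,t}(\theta)} = \log \frac{\mu_{i,t-1}(\hat{\theta})}{\mu_{i,t-1}(\theta)} + \log \frac{\ell_i(s_{i,t}|\hat{\theta})}{\ell_i(s_{i,t}|\theta)}. \tag{6} $$

Therefore, for any $\hat{\theta} \in \mathcal{O}_i(\theta)$, we have

$$ \frac{\mu_{i,t}(\hat{\theta})}{\mu_{i,t}(\theta)} = \frac{\mu_{i,\hat{t}}(\hat{\theta})}{\mu_{i,\hat{t}}(\theta)}, $$

for any $t > \hat{t}$, since in (6) the likelihood ratio is one and $\log \frac{\ell_i(s_{i,t}|\hat{\theta})}{\ell_i(s_{i,t}|\theta)} = 0$ by the definition of observationally equivalent states in Definition 1. On the other hand, for any $\hat{\theta} \in \Theta \setminus \mathcal{O}_i(\theta)$, simplifying (6) and dividing by $t$, we obtain

$$ \frac{1}{t} \log \frac{\mu_{i,t}(\hat{\theta})}{\mu_{i,t}(\theta)} = \frac{1}{t} \log \frac{\mu_{i,\hat{t}}(\hat{\theta})}{\mu_{i,\hat{t}}(\theta)} + \frac{1}{t} \sum_{\tau=\hat{t}+1}^{t} \log \frac{\ell_i(s_{i,\tau}|\hat{\theta})}{\ell_i(s_{i,\tau}|\theta)} \to \mathbb{E}\left[\log \frac{\ell_i(\cdot|\hat{\theta})}{\ell_i(\cdot|\theta)}\right] = -D_{KL}\left(\ell_i(\cdot|\theta) \,\|\, \ell_i(\cdot|\hat{\theta})\right) < 0, $$

almost surely by the Strong Law of Large Numbers (SLLN). Note that since the signals are i.i.d. over time and the log-marginals are bounded (assumption A1), the SLLN can be applied. The above entails that $\mu_{i,t}(\hat{\theta}) \to 0$ for any $\hat{\theta} \in \Theta \setminus \mathcal{O}_i(\theta)$, thereby completing the proof. ■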
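
The SLLN step in the proof of Lemma 1 can be illustrated numerically: for an agent who only performs Bayesian updates, the time-averaged log-likelihood ratio of a distinguishable false state against the true state should approach the negative Kullback-Leibler divergence. The sketch below uses a hypothetical binary signal structure, sample size, and seed.

```python
import math
import random

TRUE_PMF = [0.5, 0.5]   # hypothetical signal pmf under the true state
ALT_PMF = [0.2, 0.8]    # hypothetical signal pmf under a distinguishable false state

def empirical_log_ratio_rate(t=20000, seed=1):
    """Average of log(l(s|alt) / l(s|true)) over t i.i.d. signals drawn under the truth."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(t):
        s = 0 if rng.random() < TRUE_PMF[0] else 1
        total += math.log(ALT_PMF[s] / TRUE_PMF[s])
    return total / t

# D_KL(true || alt), the quantity the empirical rate is compared against.
kl = sum(p * math.log(p / q) for p, q in zip(TRUE_PMF, ALT_PMF))
rate = empirical_log_ratio_rate()
```

A negative `rate` close to `-kl` means the log belief ratio in (6) decreases roughly linearly in $t$, so the belief on the false state decays exponentially.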


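Proof of Theorem 1: Fix any time $t_0 \in \mathbb{N}$. When an agent uses Bayes' rule for $t > t_0$, in view of Lemma 1 the condition $\|\mu^{B}_{i,t}(\cdot) - \mu_{i,t-1}(\cdot)\|_{TV} \le \tau$ will be satisfied in a finite (random) time due to almost sure convergence of Bayes' rule. Therefore, all neighboring agents will eventually communicate with each other in the interval $[t_0, \infty)$. On the other hand, the underlying graph $\mathcal{G}$ is strongly connected by assumption A3; hence, all the conditions of Proposition 1 are satisfied and the left product $Q_t Q_{t-1} \cdots Q_1$ has a limit, and since the matrices in the sequence $\{Q_t\}_{t=1}^{\infty}$ are doubly stochastic by the switching rule in Section III-B, we get

$$ \prod_{\rho=0}^{t-1} Q_{t-\rho} \to \frac{1}{n} \mathbb{1}\mathbb{1}^{T}, \tag{7} $$

almost surely. Recalling (5), we can write

$$ \frac{1}{t}\phi_{i,t}(\hat{\theta}) = \frac{1}{t} \sum_{\tau=0}^{t} \sum_{j=1}^{n} \left[\prod_{\rho=0}^{t-1-\tau} Q_{t-\rho}\right]_{ij} \log \ell_j(s_{j,\tau}|\hat{\theta}) = \frac{1}{nt} \sum_{\tau=0}^{t} \sum_{j=1}^{n} \log \ell_j(s_{j,\tau}|\hat{\theta}) + e_{i,t}, \tag{8} $$

where

$$ e_{i,t} = \frac{1}{t} \sum_{\tau=0}^{t} \sum_{j=1}^{n} \left(\left[\prod_{\rho=0}^{t-1-\tau} Q_{t-\rho}\right]_{ij} - \frac{1}{n}\right) \log \ell_j(s_{j,\tau}|\hat{\theta}). $$

Since the log-marginals are bounded (assumption A1), in view of (7) we get

$$ |e_{i,t}| \le \frac{B}{t} \sum_{\tau=0}^{t} \sum_{j=1}^{n} \left|\left[\prod_{\rho=0}^{t-1-\tau} Q_{t-\rho}\right]_{ij} - \frac{1}{n}\right| \to 0, \tag{9} $$

as $t \to \infty$, since the Cesàro mean preserves the limit. Also, applying the SLLN we have

$$ \frac{1}{nt} \sum_{\tau=0}^{t} \sum_{j=1}^{n} \log \ell_j(s_{j,\tau}|\hat{\theta}) \to \frac{1}{n} \sum_{j=1}^{n} \mathbb{E}\left[\log \ell_j(\cdot|\hat{\theta})\right], $$

almost surely. Combining the above with (8) and (9) and recalling the definition of $I(\hat{\theta}, \theta)$ in (1), we derive

$$ \frac{1}{t}\phi_{i,t}(\hat{\theta}) - \frac{1}{t}\phi_{i,t}(\theta) \to I(\hat{\theta}, \theta), \tag{10} $$

almost surely, which guarantees that

$$ e^{\phi_{i,t}(\hat{\theta}) - \phi_{i,t}(\theta)} \to 0, \tag{11} $$

for any $\hat{\theta} \in \Theta \setminus \{\theta\}$, since $I(\hat{\theta}, \theta) < 0$ due to global identifiability of $\theta$ (assumption A2). Now observe that

$$ \mu_{i,t}(\theta) = \frac{\mu_{i,0}(\theta)\, e^{\phi_{i,t}(\theta)}}{\sum_{\tilde{\theta} \in \Theta} \mu_{i,0}(\tilde{\theta})\, e^{\phi_{i,t}(\tilde{\theta})}} = \frac{1}{1 + \sum_{\hat{\theta} \in \Theta \setminus \{\theta\}} \frac{\mu_{i,0}(\hat{\theta})}{\mu_{i,0}(\theta)}\, e^{\phi_{i,t}(\hat{\theta}) - \phi_{i,t}(\theta)}}. \tag{12} $$

Taking the limit and using (11), the proof of convergence follows immediately, and per (10) this convergence is exponentially fast with the asymptotic rate $\min_{\hat{\theta} \in \Theta \setminus \{\theta\}} \{-I(\hat{\theta}, \theta)\}$ corresponding to the slowest vanishing summand in the denominator of (12).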